Kruskal-Wallis Test
The Kruskal-Wallis test is a non-parametric statistical test used to determine whether three or more independent groups come from the same distribution. When the group distributions have a similar shape, it can be read as a test for differences between the group medians. It extends the Mann-Whitney U test to more than two groups and is particularly useful when the assumptions of one-way ANOVA (such as normality) are not met.
Assumptions
The Kruskal-Wallis test relies on the following assumptions:
- Independence of Samples: The groups are independent of one another.
- Ordinal or Continuous Data: The data within and across groups should be ordinal or continuous.
- Similarity of Shape: The distributions of the groups should be similar, allowing the medians to be comparable.
Hypotheses
The hypotheses for the Kruskal-Wallis test are as follows:
- Null Hypothesis (H₀): The medians of all groups are equal.
- Alternative Hypothesis (H₁): At least one group’s median is different from the others.
Calculation Steps
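- Rank all data from all groups together; the lowest value gets rank 1, the next lowest rank 2, and so on. Tied values each receive the average of the ranks they span.
- Calculate the sum of ranks \(R_i\) for each group of size \(n_i\).
- Calculate the H statistic: \(H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)\), where \(N\) is the total number of observations and \(k\) is the number of groups. When ties are present, H is divided by \(1 - \sum_j (t_j^3 - t_j)/(N^3 - N)\), where \(t_j\) is the number of observations sharing the j-th tied value.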
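The three steps above can be sketched in Python as follows. This is a minimal illustration, not a replacement for a library routine: the function name `kruskal_wallis_h` is hypothetical, the standard tie correction is assumed, and the input is the medication data from the Example Problem in this article.

```python
import numpy as np
from scipy.stats import rankdata


def kruskal_wallis_h(*groups):
    """Return (tie-corrected H, rank sums) for the given groups (illustrative helper)."""
    sizes = [len(g) for g in groups]
    pooled = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    n_total = pooled.size

    # Step 1: rank the pooled data; tied values share the average rank.
    ranks = rankdata(pooled)

    # Step 2: sum the ranks within each group.
    boundaries = np.cumsum(sizes)[:-1]
    rank_sums = [float(r.sum()) for r in np.split(ranks, boundaries)]

    # Step 3: H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
    h = 12.0 / (n_total * (n_total + 1)) * sum(
        rs ** 2 / n for rs, n in zip(rank_sums, sizes)
    ) - 3 * (n_total + 1)

    # Assumed tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    _, counts = np.unique(pooled, return_counts=True)
    correction = 1.0 - (counts ** 3 - counts).sum() / (n_total ** 3 - n_total)
    return float(h / correction), rank_sums


h, rank_sums = kruskal_wallis_h(
    [67, 75, 74, 70],  # Medication A
    [70, 65, 76, 68],  # Medication B
    [82, 85, 87, 83],  # Medication C
    [60, 59, 61, 65],  # Medication D
)
print("Rank sums:", rank_sums)  # [35.5, 32.0, 58.0, 10.5]
print(f"H = {h:.4f}")           # H = 12.5498
```

The data contain two pairs of tied values (65 and 70), so the uncorrected H (about 12.51) is slightly smaller than the corrected value.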
Interpretation
A large value of H is evidence against the null hypothesis. Under the null hypothesis, H approximately follows a chi-square distribution with \(k-1\) degrees of freedom, where \(k\) is the number of groups. If the calculated H is greater than the critical value from the chi-square table at the chosen significance level (equivalently, if the p-value is below that level), the null hypothesis is rejected.
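As a rough sketch of this decision rule, the critical value and p-value can be read from the chi-square distribution in SciPy instead of a printed table. The helper name `kruskal_wallis_decision` and the default significance level of 0.05 are assumptions made for illustration; the inputs H = 12.55 and k = 4 are taken from the medication example in this article.

```python
from scipy.stats import chi2


def kruskal_wallis_decision(h, k, alpha=0.05):
    """Compare H against the chi-square critical value with k - 1 degrees of freedom."""
    df = k - 1
    critical_value = chi2.ppf(1 - alpha, df)  # upper-tail critical value
    p_value = chi2.sf(h, df)                  # P(chi-square >= h)
    return critical_value, p_value, h > critical_value


critical, p, reject = kruskal_wallis_decision(12.55, k=4)
print(f"critical value = {critical:.3f}")  # critical value = 7.815
print(f"p-value = {p:.4f}")                # p-value = 0.0057
print("reject H0:", reject)                # reject H0: True
```

Comparing H with the critical value and comparing the p-value with the significance level always lead to the same decision; the p-value simply carries more information.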
Example Problem
Let’s consider an example where a researcher wants to compare the effectiveness of four different medications. The response scores from patients are as follows:
- Medication A: 67, 75, 74, 70
- Medication B: 70, 65, 76, 68
- Medication C: 82, 85, 87, 83
- Medication D: 60, 59, 61, 65
Hypotheses:
- Null Hypothesis (H₀): The median response scores for all four medications are the same.
- Alternative Hypothesis (H₁): At least one medication’s median response score is different from the others.
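Before running the test, it can help to look at the sample medians that the hypotheses refer to. The snippet below is a small descriptive sketch using only the Python standard library; the dictionary name `scores` is chosen here for illustration.

```python
from statistics import median

# Response scores from the example, keyed by medication.
scores = {
    "Medication A": [67, 75, 74, 70],
    "Medication B": [70, 65, 76, 68],
    "Medication C": [82, 85, 87, 83],
    "Medication D": [60, 59, 61, 65],
}

# Sample median of each group.
medians = {name: median(values) for name, values in scores.items()}
for name, value in medians.items():
    print(f"{name}: median = {value}")
# Medication A: median = 72.0
# Medication B: median = 69.0
# Medication C: median = 84.0
# Medication D: median = 60.5
```

The sample medians differ noticeably, with Medication C highest and Medication D lowest; the test assesses whether differences of this size are plausible under the null hypothesis.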
Kruskal-Wallis Test using R:
Code
# Data for the medications
med_a <- c(67, 75, 74, 70)
med_b <- c(70, 65, 76, 68)
med_c <- c(82, 85, 87, 83)
med_d <- c(60, 59, 61, 65)
# Combine into a list
data <- list(Medication_A = med_a, Medication_B = med_b, Medication_C = med_c, Medication_D = med_d)
# Perform Kruskal-Wallis test
kw_test <- kruskal.test(data)
# Print the results
print(kw_test)
Output
Kruskal-Wallis rank sum test
data: data
Kruskal-Wallis chi-squared = 12.55, df = 3, p-value = 0.005719
Kruskal-Wallis Test using Python:
Code
from scipy.stats import kruskal
# Data for the medications
med_a = [67, 75, 74, 70]
med_b = [70, 65, 76, 68]
med_c = [82, 85, 87, 83]
med_d = [60, 59, 61, 65]
# Perform Kruskal-Wallis test
statistic, p_value = kruskal(med_a, med_b, med_c, med_d)
# Print the results
print("Kruskal-Wallis statistic:", statistic, "P-value:", p_value)
Output
Kruskal-Wallis statistic: 12.54977876106195 P-value: 0.005718662446349043
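In both the R and Python results, the p-value (about 0.0057) is below 0.05, so the null hypothesis is rejected: at least one medication’s response scores differ from the others. The Kruskal-Wallis test thus offers a rank-based alternative to one-way ANOVA when the data do not meet ANOVA’s assumptions, which makes it useful in fields such as medicine, psychology, and ecological research.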